---
title: Replication Material, Mass Media and Electoral Preferences during the 2016 Presidential
  Race
output:
  pdf_document: default
  html_notebook: default
---

### Forthcoming in Political Behavior

#### Christopher Wlezien and Stuart Soroka

June 12, 2018

The following scripts replicate analyses in "Mass Media and Electoral Preferences during the 2016 Presidential Race," by Christopher Wlezien and Stuart Soroka, forthcoming in Political Behavior.

Note that all analyses for the paper were run in STATA, while data management and graphics were done in R. The following script thus uses the RStata package to run STATA analyses in batch mode from R. STATA users can also paste the STATA commands directly into STATA.

This script is distributed with three datasets:

1. rawpolldata2016.dta: Raw polling data, used to estimate house effects and in Figure 1.
2. aggregatedpolls2016.dta: Aggregated polling data, used in Figures 1 and 2.
3. analysis2016.dta: The final working dataset, including final polling data, media tone data, and other variables, used in Figures 3 and 4, and all tables.

Further description of each dataset follows. Note that to run analyses, STATA users must load the appropriate dataset: rawpolldata2016.dta to estimate house effects, and analysis2016.dta for all tables. Setup for R (including running STATA in batch mode) is as follows:

```{r setup}

library(foreign)
library(RStata)
library(DataCombine)
library(knitr)
library(tinytex)

#set path for STATA
options("RStata.StataPath" = '/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp')
options("RStata.StataVersion" = 15)

#set working directory - CHANGE TO CURRENT WORKING DIRECTORY
wd <- "/Users/stuartsoroka/Desktop/pb" 
setwd(wd)

#load datasets
R <- read.dta("rawpolldata2016.dta")
A <- read.dta("aggregatedpolls2016.dta")
N <- read.dta("analysis2016.dta")

```

The dependent variables in the article are based on pre-election polls of vote intentions and also the tone of media coverage in nine newspapers, as described in the text. 

The analysis uses 308 polls; their results and descriptive information are contained in the Stata dataset “rawpolldata2016.dta”. The dataset includes the following variables:

1. pollster_name: the actual name of the polling organization, e.g., CNN
2. startdate: the number of days before Election Day that the survey began 
3. enddate: the number of days before Election Day that the survey ended
4. mediandate: the middle date of the polling period in terms of the number of days before Election Day, rounding down (toward Election Day) for surveys in the field an even number of days.
5. clinton: Hillary Clinton’s poll share
6. trump: Donald Trump’s poll share
7. other: other candidates’ poll shares combined
8. undecided: share of respondents who said they were undecided
9. population: survey population, where 1=all adults, 2=registered voters, and 3=likely voters
10. respondents: total number of respondents
11. clintonN: the clinton share variable multiplied by the number of respondents divided by 100
12. trumpN: the trump share variable multiplied by the number of respondents divided by 100
13. pollster_code: a code for each of the 39 separate polling organizations
14. pollster_good5: a code for the 22 pollsters who fielded 5 or more separate vote intention polls during 2016; all other polling organizations are coded “0”
15. clinton2p: Hillary Clinton’s two-party poll share
16. daysinfield: the number of days a poll was in the field
17. houseclinton2p: Hillary Clinton’s house-adjusted two-party poll share, centering on the median polling organization, CNN
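
As an illustration of how the derived variables in this list relate to the raw ones, the following is a minimal sketch on made-up polls. The data values are invented, and the formulas for daysinfield (counting both the first and last day), mediandate (using `floor()`), and clinton2p (Clinton's share of the Clinton-plus-Trump total) are assumptions consistent with the descriptions above, not code taken from the original data construction.

```r
# Toy polls; values are illustrative only, not taken from rawpolldata2016.dta.
polls <- data.frame(
  startdate   = c(10, 20, 5),    # days before Election Day the poll began
  enddate     = c(7, 18, 5),     # days before Election Day the poll ended
  clinton     = c(46, 44, 45),   # Clinton poll share (percent)
  trump       = c(42, 44, 40),   # Trump poll share (percent)
  respondents = c(1000, 800, 1200)
)

# Days in the field, counting both the first and the last day (assumed convention).
polls$daysinfield <- polls$startdate - polls$enddate + 1

# Middle of the field period. Dates count days *before* Election Day, so
# floor() rounds toward Election Day when the period has an even number of days.
polls$mediandate <- floor((polls$startdate + polls$enddate) / 2)

# Candidate Ns: share times respondents, divided by 100 (as described in the list).
polls$clintonN <- polls$clinton * polls$respondents / 100
polls$trumpN   <- polls$trump * polls$respondents / 100

# Two-party share (assumed formula): Clinton as a percentage of Clinton plus Trump.
polls$clinton2p <- 100 * polls$clinton / (polls$clinton + polls$trump)

print(polls)
```

In this toy example the first poll runs from day 10 to day 7, so it is in the field for 4 days and its mediandate is 8 rather than 9.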

All but the last variable are based on data provided by the Huffington Post. The last variable, houseclinton2p, is generated from an analysis of variance (ANOVA) that is summarized in Table A1 of the article. The code to estimate the equation using rawpolldata2016.dta is as follows:

```{r house effects}

stata_src <- "
anova clinton2p mediandate pollster_good5 population [aweight=respondents]
reg"
stata(stata_src,data.in=R)

```

The houseclinton2p variable (included in the rawpolldata2016.dta dataset) is then Clinton’s two-party poll share (clinton2p) with the estimated house effect for each organization subtracted out and the house effect (-1.61739) for the median pollster (CNN, pollster #21) added to all 308 polls.
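
The adjustment described above can be sketched in R as follows. This is only an illustration under stated assumptions: the house effects in `house` are hypothetical numbers, not the estimates from Table A1, and treating the pollsters coded 0 as having a house effect of zero is an assumption made for the toy example.

```r
# Toy polls; values are illustrative only.
polls <- data.frame(
  pollster_good5 = c(21, 21, 7, 0),
  clinton2p      = c(52.0, 51.0, 54.0, 50.5)
)

# Hypothetical house effects keyed by pollster code (NOT the Table A1 estimates).
house <- c("0" = 0, "7" = 1.5, "21" = -0.5)
median_code <- "21"  # code of the median pollster in this toy example

# Look up each poll's own house effect, subtract it, then add the
# median pollster's house effect to every poll.
own <- unname(house[as.character(polls$pollster_good5)])
polls$houseclinton2p <- polls$clinton2p - own + house[[median_code]]

print(polls)
```

By construction, polls from the median pollster are left unchanged, while every other poll is shifted to the level it would be expected to show had the median pollster conducted it.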

To aggregate the house-adjusted polls for plotting in Figure 1, the houseclinton2p variable was divided by 100 and then multiplied by the sum of clintonN and trumpN to produce a house-adjusted Clinton N, and an associated Trump N variable was calculated by subtracting the house-adjusted Clinton N from the sum of clintonN and trumpN.   These Ns from polls centered on each day were summed and a corresponding Clinton share calculated.  The resulting estimates (aggregatedclinton2p) are contained in the Stata dataset “aggregatedpolls2016.dta”.
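
The aggregation steps just described can be sketched as follows, using invented values. The helper column names (`twopartyN`, `adjclintonN`, `adjtrumpN`) are hypothetical and do not appear in the distributed datasets.

```r
# Toy polls; values are illustrative only.
polls <- data.frame(
  mediandate     = c(8, 8, 5),
  clintonN       = c(460, 352, 540),
  trumpN         = c(420, 352, 480),
  houseclinton2p = c(52, 50, 53)
)

# House-adjusted Clinton N: adjusted share applied to the two-party N.
polls$twopartyN   <- polls$clintonN + polls$trumpN
polls$adjclintonN <- polls$houseclinton2p / 100 * polls$twopartyN

# Associated Trump N: the remainder of the two-party N.
polls$adjtrumpN <- polls$twopartyN - polls$adjclintonN

# Sum the Ns over polls centered on the same day, then compute the share.
daily <- aggregate(cbind(adjclintonN, adjtrumpN) ~ mediandate,
                   data = polls, FUN = sum)
daily$aggregatedclinton2p <-
  100 * daily$adjclintonN / (daily$adjclintonN + daily$adjtrumpN)

print(daily)
```

Because the Ns rather than the shares are summed, larger polls carry more weight in each day's estimate.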

To create our main vote intentions variable, we “pool” the polls by each day a survey is in the field.  For this, the house-adjusted Clinton and Trump Ns from each poll are allocated equally across days based on the number of days a survey is in the field, e.g., dividing by 3 for a survey in the field for 3 days.  The Ns on each day were then summed and the Clinton share calculated.  The resulting estimates (pooledclinton2p) are also contained in the Stata dataset “aggregatedpolls2016.dta”.
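
A minimal sketch of this daily pooling, again on invented values, is given below. The column names `adjclintonN` and `adjtrumpN` are hypothetical stand-ins for the house-adjusted Ns, and counting daysinfield inclusively is an assumption.

```r
# Toy polls with house-adjusted Ns; values are illustrative only.
polls <- data.frame(
  startdate   = c(10, 9),   # days before Election Day the poll began
  enddate     = c(8, 8),    # days before Election Day the poll ended
  adjclintonN = c(300, 110),
  adjtrumpN   = c(300, 90)
)
polls$daysinfield <- polls$startdate - polls$enddate + 1

# Expand each poll to one row per day in the field, giving each day
# an equal share of the poll's Ns.
expanded <- do.call(rbind, lapply(seq_len(nrow(polls)), function(i) {
  p <- polls[i, ]
  data.frame(
    day      = seq(p$startdate, p$enddate),
    clintonN = p$adjclintonN / p$daysinfield,
    trumpN   = p$adjtrumpN / p$daysinfield
  )
}))

# Sum the allocated Ns on each day and compute the Clinton share.
pooled <- aggregate(cbind(clintonN, trumpN) ~ day, data = expanded, FUN = sum)
pooled$pooledclinton2p <- 100 * pooled$clintonN / (pooled$clintonN + pooled$trumpN)

print(pooled)
```

Here the first poll contributes 100 Clinton and 100 Trump respondents to each of days 10, 9, and 8, and the second contributes 55 and 45 to each of days 9 and 8, so days on which both polls were in the field reflect both.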

#### Figure 1:

```{r figure 1}

trend1 <- A[,c("til","aggregatedclinton2p")]
trend1 <- trend1[!is.na(trend1$aggregatedclinton2p),]
trend2 <- A[,c("til","pooledclinton2p")]
trend2 <- trend2[!is.na(trend2$pooledclinton2p),]

{
pdf("figure1.pdf",width=10,height=8)
plot(R$til,R$clinton2p,type="p",axes=F,ann=F,col="gray50",ylim=c(45,61))
lines(trend1$til,trend1$aggregatedclinton2p,col="gray70",lwd=3)
lines(trend2$til,trend2$pooledclinton2p,col="black",lwd=4)
axis(1)
axis(2,las=1)
title(xlab="Days Until Election",ylab="Clinton 2-Party Vote Share")
text(-300,47,"Trend, Adjusted for House and Design Effects",pos=4,col="gray70")
text(-300,46.4,"Trend, Adjusted for House and Design Effects, and with Daily Weighting",
     pos=4,col="black")
abline(v=(-125),col="gray",lty=3)
text(-125-2,61,"Comey Announcement",srt=90,cex=.8,adj = c(1,0))
abline(v=(-112),col="gray",lty=3)
text(-112-2,61,"Rep Convention",srt=90,cex=.8,adj = c(1,0))
abline(v=(-105),col="gray",lty=3)
text(-105-2,61,"Dem Convention",srt=90,cex=.8,adj = c(1,0))
abline(v=(-42),col="gray",lty=3)
text(-42-2,61,"1st Debate",srt=90,cex=.8,adj = c(1,0))
abline(v=(-31),col="gray",lty=3)
text(-31-2,61,"Sex Tape",srt=90,cex=.8,adj = c(1,0))
abline(v=(-10),col="gray",lty=3)
text(-10-2,61,"Comey Letter",srt=90,cex=.8,adj = c(1,0))
invisible(dev.off())
}

include_graphics("figure1.pdf")

```

#### Figure 2:

```{r figure 2}

trend2 <- A[,c("til","pooledclinton2p")]
trend2 <- trend2[!is.na(trend2$pooledclinton2p),]

{
pdf("figure2.pdf",width=10,height=8)
plot(trend2$til[trend2$til>(-64)],trend2$pooledclinton2p[trend2$til>(-64)]
     ,type="l",axes=F,ann=F,col="black",lwd=4,ylim=c(45,61))
axis(1)
axis(2,las=1)
title(xlab="Days Until Election",ylab="Clinton 2-Party Vote Share")
text(-60,46.4,"Trend, Adjusted for House and Design Effects, and with Daily Weighting",
     pos=4,col="black")
abline(v=(-42),col="gray",lty=3)
text(-42-1,61,"1st Debate",srt=90,cex=.8,adj = c(1,0))
abline(v=(-31),col="gray",lty=3)
text(-31-1,61,"Sex Tape",srt=90,cex=.8,adj = c(1,0))
abline(v=(-10),col="gray",lty=3)
text(-10-1,61,"Comey Letter",srt=90,cex=.8,adj = c(1,0))
invisible(dev.off())
}

include_graphics("figure2.pdf")

```

The primary media variable used in the analysis is the net Clinton minus Trump tone of coverage (clintonminustrumptone), which is included in “analysis2016.dta”, the Stata dataset used for all of the tables in the paper. Details about the media measure are in the paper.

#### Figure 3:

```{r figure 3}

N$sm <- (N$clintonminustrumptone + shift(N$clintonminustrumptone,-1,reminder=FALSE) +
           shift(N$clintonminustrumptone,-2,reminder=FALSE) ) /3
trend3 <- N[,c("til","sm")]
trend3 <- trend3[!is.na(trend3$sm),]

{
pdf("figure3.pdf",width=10,height=8)
plot(N$til[N$til<0],N$clintonminustrumptone[N$til<0],type="p",axes=F,ann=F,
     col="gray50",ylim=c(-.4,.5))
lines(trend3$til[trend3$til<0],trend3$sm[trend3$til<0],col="black",lwd=4)
axis(1)
axis(2,las=1)
title(xlab="Days Until Election",ylab="Net Clinton minus Trump Media Tone")
text(-300,-.35,"Trend, 3-day Rolling Average, News Weighted by Circulation",
     pos=4,col="black")
abline(v=(-125),col="gray",lty=3)
text(-125-2,.5,"Comey Announcement",srt=90,cex=.8,adj = c(1,0))
abline(v=(-112),col="gray",lty=3)
text(-112-2,.5,"Rep Convention",srt=90,cex=.8,adj = c(1,0))
abline(v=(-105),col="gray",lty=3)
text(-105-2,.5,"Dem Convention",srt=90,cex=.8,adj = c(1,0))
abline(v=(-42),col="gray",lty=3)
text(-42-2,.5,"1st Debate",srt=90,cex=.8,adj = c(1,0))
abline(v=(-31),col="gray",lty=3)
text(-31-2,.5,"Sex Tape",srt=90,cex=.8,adj = c(1,0))
abline(v=(-10),col="gray",lty=3)
text(-10-2,.5,"Comey Letter",srt=90,cex=.8,adj = c(1,0))
invisible(dev.off())
}

include_graphics("figure3.pdf")

```

#### Figure 4:

```{r figure 4}

N$sm <- (N$clintonminustrumptone + shift(N$clintonminustrumptone,-1,reminder=FALSE) +
           shift(N$clintonminustrumptone,-2,reminder=FALSE) ) /3
trend3 <- N[,c("til","sm")]
trend3 <- trend3[!is.na(trend3$sm),]

{
pdf("figure4.pdf",width=10,height=8)
plot(trend3$til[trend3$til>(-64)],trend3$sm[trend3$til>(-64)],type="l",axes=F,ann=F,
     col="black",ylim=c(-.4,.5),lwd=4)
axis(1)
axis(2,las=1)
title(xlab="Days Until Election",ylab="Net Clinton minus Trump Media Tone")
text(-60,-.35,"Trend, 3-day Rolling Average, News Weighted by Circulation",pos=4,
     col="black")
abline(v=(-42),col="gray",lty=3)
text(-42-1,.5,"1st Debate",srt=90,cex=.8,adj = c(1,0))
abline(v=(-31),col="gray",lty=3)
text(-31-1,.5,"Sex Tape",srt=90,cex=.8,adj = c(1,0))
abline(v=(-10),col="gray",lty=3)
text(-10-1,.5,"Comey Letter",srt=90,cex=.8,adj = c(1,0))
invisible(dev.off())
}

include_graphics("figure4.pdf")

```

The “analysis2016.dta” dataset includes all variables necessary to produce all of the tables.  The data are sorted by mediandate, and range from date 309 at the beginning of the election year to date 1, which is the day before the election.  Besides the pooled polls and net media tone variables, there are variables capturing specific events: the first Comey intrusion (comey1), the Republican convention (gop), the Democratic convention (dem), the first debate (debate1), the sex tapes revelation (tapes), and the second Comey intrusion (comey2).

Analyses, by table, estimated in STATA, are as follows.

#### Table 1:

```{r Table 1}

stata_src <- "
corrgram pooledclinton2p if til>-201, lags(10)
corrgram pooledclinton2p if til>-64, lags(10)"
stata(stata_src,data.in=N)

```

#### Table 2:

```{r Table 2}

stata_src <- "
corrgram clintonminustrumptone if til>-201, lags(10)
corrgram clintonminustrumptone if til>-64, lags(10)"
stata(stata_src,data.in=N)

```

#### Table 3:

```{r Table 3}

stata_src <- "
tsset til
regr clintonminustrumptone l.clintonminustrumptone l.pooledclinton2p ///
if til>-201
regr pooledclinton2p l.clintonminustrumptone l.pooledclinton2p ///
if til>-201"
stata(stata_src,data.in=N)

```

#### Table 4:

```{r Table 4}

stata_src <- "
tsset til
regr clintonminustrumptone l.clintonminustrumptone l.gop l.dem l.comey1 ///
l.debate1 l.tapes l.comey2 if til>-201
regr pooledclinton2p l.pooledclinton2p l.gop dem l2.comey1 l2.debate1 ///
l.tapes l.comey2 if til>-201"
stata(stata_src,data.in=N)

```

#### Table 5:

```{r Table 5}

stata_src <- "
tsset til
regr clintonminustrumptone l.clintonminustrumptone l.pooledclinton2p ///
l.gop l.dem l.comey1 l.debate1 l.tapes l.comey2 if til>-201
regr pooledclinton2p l.pooledclinton2p l.clintonminustrumptone l.gop ///
dem l2.comey1 l2.debate1 l.tapes l.comey2 if til>-201"
stata(stata_src,data.in=N)

```

#### Table 6:

```{r Table 6}

stata_src <- "
xcorr clintonminustrumptone pooledclinton2p if til>-201, lags(10) tab
xcorr clintonminustrumptone pooledclinton2p if til>-64, lags(10) tab"
stata(stata_src,data.in=N)

```